Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models

Authors

  • Niharika Gauraha
  • Swapan K. Parui
Abstract

We consider variable selection problems in high-dimensional sparse regression models with strongly correlated variables. To handle correlated variables, the concept of clustering or grouping variables and then pursuing model fitting is widely accepted. When the dimension is very high, finding an appropriate group structure is as difficult as the original problem. We propose to use the Elastic-net as a pre-selection step for Cluster Lasso methods (i.e. Cluster Group Lasso and Cluster Representative Lasso). The Elastic-net selects correlated relevant variables, but it fails to reveal the correlation structure among the active variables. We use Cluster Lasso methods to address this shortcoming of the Elastic-net, and the Elastic-net in turn provides a reduced feature set for the Cluster Lasso methods. We theoretically explore the group selection consistency of the proposed combination of algorithms under various conditions, namely the Irrepresentable Condition (IC), the Elastic-net Irrepresentable Condition (EIC) and the Group Irrepresentable Condition (GIC). We support the theory with examples on simulated and real datasets.
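
The two-stage idea in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names, the toy design, the penalty values `lam1` and `lam2`, the 0.9 correlation threshold, the single-linkage grouping, and the use of cluster averages as representatives are all assumptions made for illustration. Columns and the response are assumed to be centered, and the Elastic-net is fitted by a plain coordinate descent.

```python
import math

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def elastic_net(cols, y, lam1, lam2, sweeps=200):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||_2^2.
    `cols` is a list of centered, non-constant columns; lam2 = 0 gives the Lasso."""
    n, p = len(y), len(cols)
    b = [0.0] * p
    r = list(y)  # current residual y - Xb
    for _ in range(sweeps):
        for j in range(p):
            xj = cols[j]
            zj = dot(xj, xj) / n
            rho = dot(xj, r) / n + zj * b[j]  # fit of x_j to the partial residual
            new = soft_threshold(rho, lam1) / (zj + lam2)
            if new != b[j]:
                d = new - b[j]
                r = [ri - d * xi for ri, xi in zip(r, xj)]
                b[j] = new
    return b

def correlation_clusters(cols, idx, threshold):
    """Single-linkage grouping of the selected variables:
    two variables are linked when |correlation| >= threshold."""
    parent = {j: j for j in idx}

    def find(j):
        while parent[j] != j:
            j = parent[j]
        return j

    for a in idx:
        for b in idx:
            if a < b:
                norm = math.sqrt(dot(cols[a], cols[a]) * dot(cols[b], cols[b]))
                if abs(dot(cols[a], cols[b]) / norm) >= threshold:
                    parent[find(b)] = find(a)
    groups = {}
    for j in idx:
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())

def representatives(cols, clusters):
    """One averaged column per cluster (the cluster representative)."""
    n = len(cols[0])
    return [[sum(cols[j][i] for j in g) / len(g) for i in range(n)] for g in clusters]

# Toy design (hypothetical): columns 0 and 1 are strongly correlated and relevant,
# column 2 is relevant and uncorrelated with them, column 3 is irrelevant.
a = [1, -1, 1, -1, 1, -1, 1, -1]
a_near = [1.1, -0.9, 0.9, -1.1, 1.1, -0.9, 0.9, -1.1]
c = [1, 1, 1, 1, -1, -1, -1, -1]
w = [1, -1, -1, 1, 1, -1, -1, 1]
cols = [a, a_near, c, w]
y = [2 * a[i] + 2 * a_near[i] + 3 * c[i] for i in range(8)]

# Stage 1: Elastic-net pre-selection on all variables.
beta = elastic_net(cols, y, lam1=0.1, lam2=0.5)
selected = [j for j, bj in enumerate(beta) if abs(bj) > 1e-10]

# Stage 2: cluster the selected variables, then fit a Lasso on the representatives.
clusters = correlation_clusters(cols, selected, threshold=0.9)
gamma = elastic_net(representatives(cols, clusters), y, lam1=0.1, lam2=0.0)

print(selected, clusters, gamma)
```

In this toy design the pre-selection keeps the correlated pair together with the third relevant variable and drops the irrelevant column, so the clustering step only has to group three variables instead of four. A Cluster Group Lasso variant would replace the last step with a group-penalized fit on the clusters rather than a Lasso on their averages.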

Similar resources

Efficient Clustering of Correlated Variables and Variable Selection in High-Dimensional Linear Models

In this paper, we introduce the Adaptive Cluster Lasso (ACL) method for variable selection in high-dimensional sparse regression models with strongly correlated variables. To handle correlated variables, the concept of clustering or grouping variables and then pursuing model fitting is widely accepted. When the dimension is very high, finding an appropriate group structure is as difficult as the ori...

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

We consider the problem of model selection and estimation in sparse high-dimensional linear regression models with strongly correlated variables. First, we study the theoretical properties of the dual Lasso solution, and we show that joint consideration of the Lasso primal and dual solutions is useful for selecting correlated active variables. Second, we argue that correlation among active...

Variable selection in linear models

Variable selection in linear models is essential for improved inference and interpretation, an activity which has become even more critical for high-dimensional data. In this article, we provide a selective review of some classical methods including the Akaike information criterion, the Bayesian information criterion, Mallows's Cp and the risk inflation criterion, as well as regularization methods including...

FIRST: Combining forward iterative selection and shrinkage in high dimensional sparse linear regression

We propose a new class of variable selection techniques for regression in high-dimensional linear models based on a forward selection version of the LASSO, adaptive LASSO or elastic net, called, respectively, the forward iterative regression and shrinkage technique (FIRST), adaptive FIRST and elastic FIRST. These methods seem to work effectively for extremely sparse high dimensional linear m...

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: With advances in science, knowledge, and technology, we increasingly deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...

Publication date: 2017